We study the persistent recoverable prevalence and the extinction of computer viruses spread
via e-mail on a growing scale-free network with new users, whose structure is estimated from
real data. The typical phenomenon is simulated in a realistic model with probabilistic execution
and detection of viruses. Moreover, the conditions for extinction by random immunization and by
targeted immunization of hubs are derived through bifurcation analysis of simpler models, using
a mean-field approximation without connectivity correlations. We can thereby qualitatively
understand the mechanisms of the spread in linearly growing scale-free networks.
I. INTRODUCTION

In spite of their different social, technological, and biological interactions, many complex
networks in the real world share a common structure based on a universal self-organizing
mechanism: network growth and preferential attachment of connections [3] [4]. The structure is
called a scale-free (SF) network and exhibits a power-law degree distribution
$P(k) \sim k^{-\gamma}$, $2 < \gamma < 3$, for the probability of $k$ connections. This topology
deviates from the conventional homogeneous regular lattices and random graphs. Many researchers
are attracted to the new paradigm of heterogeneous SF networks, which has become an active and
fruitful area.

The structure of SF networks also has a strong impact on the dynamics of epidemic models for
computer viruses, HIV, and others. Recently, it has been shown [19] that a
susceptible-infected-susceptible (SIS) model on SF networks has no epidemic threshold;
infections can proliferate however small their infection rate is. This result contradicts the
threshold theory in epidemiology [22]. The heterogeneous structure is also crucial for the
spread of viruses in the analysis of susceptible-infected-recovered (SIR) models [14] [17]. In
contrast to the absence of an epidemic threshold, an immunization strategy has been
theoretically presented for SIS models [7] [21]. The targeted immunization turns the extreme
disconnection caused by attacks against high-degree hubs on SF networks [1] into a means of
preventing the spread of infections.

In this paper, we investigate the dynamic properties of the spread of computer viruses on SF
networks estimated from real data of e-mail communication [15]. As a new property found in both
simulation and theoretical analysis, we suggest that a growing network with new e-mail users
causes recoverable prevalence from a temporary silence of almost complete extinction. This
typical phenomenon in observations [11] [23] is not explained by the above statistical analyses
at steady states or mean values (in a fixed size or $N \to \infty$). We first consider, in
simulations, a realistic growing model with probabilistic execution and detection of viruses on
the SF network. Then, to understand the mechanisms of recoverable prevalence and extinction, we
analyze simpler growing models described by deterministic equations. By using a mean-field
approximation without connectivity correlations, we derive bifurcation conditions from
extinction to recoverable prevalence (or the opposite), which are related to the growth,
infection, and immune rates. Moreover, we verify the effectiveness of targeted immunization of
hubs by anti-virus software even in the growing system.



II. E-MAIL NETWORK 



A. The state transition for infection



We consider a network whose vertices and edges represent computers and the e-mail communication
between users. The state of each computer $i = 1, \ldots, N$ changes from susceptible to hidden,
to infectious, and finally to recovered through the removal of viruses and the installation of
anti-virus software. We make a realistic model of stochastic state transitions with
probabilities for the execution and the detection of viruses. Fig. 1 shows the state
transitions, where $\lambda$ and $\delta$ denote the execution rate from the hidden to the
infectious state and the detection rate from special subjects or doubtful attachment files.
The probability of at least one detection among the $n_i$ viruses on the computer is
$1 - (1 - \delta)^{n_i}$, and the probability of at least one execution is
$1 - (1 - \lambda)^{n_i}$. We assume that an infected mail is not sent again to the same
communication partner (it is sent only once), which makes detection difficult. Thus, $n_i$ is
at most the in-degree of the vertex. In the stochastic SHIR model, the final state is
recovered, or immune through anti-virus software, if at least one infected mail is received.
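
The two probabilities above treat the $n_i$ received viruses as independent trials. As a minimal sketch (the function name is hypothetical, and the numerical rates are simply the example values $\lambda = 0.1$ and $\delta = 0.04$ used in the simulations of Section III), they can be evaluated as follows.

```python
def prob_at_least_one(rate: float, n: int) -> float:
    """Probability that at least one of n independent trials succeeds,
    each with probability `rate`, i.e. 1 - (1 - rate)**n."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - rate) ** n


# Example values of the execution rate (lambda) and detection rate (delta).
EXECUTION_RATE = 0.1
DETECTION_RATE = 0.04

if __name__ == "__main__":
    for n_viruses in (1, 5, 20):
        p_exec = prob_at_least_one(EXECUTION_RATE, n_viruses)
        p_detect = prob_at_least_one(DETECTION_RATE, n_viruses)
        print(n_viruses, round(p_exec, 4), round(p_detect, 4))
```

Both probabilities grow with the number of received viruses, so vertices with a large in-degree are the most likely both to execute and to detect a virus.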
FIG. 1: S-H-I-R state transition diagram.
FIG. 2: Power-law degree distributions with the exponents $\gamma_{out} = 2.5$ and
$\gamma_{in} = 1.9$ for (a) sent mails and (b) received mails between users, including both
internal and external ones [15]. The frequency at degree $k$ is counted in the interval
$[k, k + 10]$, except for the outliers of degree more than 100, which are counted at $k = 200$.



B. The scale-free structure

We show the network structure based on real data measured by questionnaires answered by 2,555
users as part of the World Internet Project 2000 [15]. The distributions of both sent and
received mails follow a power law in Fig. 2; the parameters are estimated as
$\gamma_{out} = 2.5$, $\gamma_{in} = 1.9$, and the average number of mails per day
$k = 5 \sim 20$. These values are close to the exponents $\gamma_{out} = 2.03 \pm 0.12$ and
$\gamma_{in} = 1.49 \pm 0.12$ [9] estimated for the server log files of e-mails [24]. In
addition, the cumulative histograms of less than degree $k$ in Fig. 3(a) have shapes similar to
those in a larger network of e-mail address books [18]. The solid lines in Fig. 3 correspond to
non-cumulative distributions of the in-degree and out-degree estimated as stretched
exponentials:

(a) $P_{in}(k) \sim k^{-1.97} \times \exp(-72.26 \times k^{-316.23})$,
    $P_{out}(k) \sim k^{-2.29} \times \exp(-56.98 \times k^{-247.39})$,

(b) $P_{in}(k) \sim k^{-1.82} \times \exp(0.63 \times k^{-52.99})$,
    $P_{out}(k) \sim k^{-2.02} \times \exp(0.6 \times k^{-59.23})$,

(c) $P_{in}(k) \sim k^{-1.75} \times \exp(-0.71 \times k^{-53.42})$,
    $P_{out}(k) \sim k^{-1.39} \times \exp(-1.43 \times k^{-137.95})$,

(d) $P_{in}(k) \sim k^{-2.87} \times \exp(3.54 \times k^{-0.048})$,
    $P_{out}(k) \sim k^{-2.49} \times \exp(3.23 \times k^{-0.02})$.

In all of them, the power-law factor characteristic of a scale-free network is dominant. Note
that the in-degree distribution in Fig. 3(d) is the closest to the exponential distribution in
[18] with a strong cut-off, and that both data sets consist of only the internal networks.
However, as in [6] [18], the reason why an exponential in-degree distribution appears only in
the internal networks requires further discussion, which is beyond the scope of this paper.
The non-exponential distributions may be caused by the limited size of the sample, or by the
fact that the eliminated links from the external nodes have an impact on the generation of
hubs in a scale-free network.



C. The (α, β) model

With the estimated parameters, we generate an SF network for the contact relations between
e-mail users by applying the simple $(\alpha, \beta)$ model [13], in which the slopes of the
power laws $\gamma_{in} \approx \frac{1}{1 - \alpha}$ and
$\gamma_{out} \approx \frac{1}{1 - \beta}$ are controlled by the $\alpha$-$\beta$ coin in
Table I (in the case of e-mails, $\alpha = 0.4736$ and $\beta = 0.6$). Growing with a new
vertex at each step, $k$ edges are added as follows. As the terminal, the coin chooses the new
vertex with probability $\alpha$ and an old vertex with probability $1 - \alpha$ in proportion
to its in-degree. As the origin, the coin chooses the new vertex with probability $\beta$ and
an old vertex with probability $1 - \beta$ in proportion to its out-degree. According to both
the growth and the preferential attachment [3] [4], the generation processes are repeated until
the required size $N$ is obtained as a connected component without self-loops and multi-edges.
The $(\alpha, \beta)$ model generates both edges from/to a new vertex and edges between old
vertices; these processes are somewhat analogous to those in the generalized BA model [2] [3].
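
The edge-generation rule described above can be sketched as follows. This is only an illustrative reading of the coin, not the authors' generator: the function names are hypothetical, the `+1` smoothing of the preferential weights (so that vertices with zero degree can be chosen) is an assumption, rejected self-loops and multi-edges are simply skipped, and the extraction of a connected component is omitted. The relation between the coin and the exponents, $1/(1-\alpha)$ and $1/(1-\beta)$, is likewise used here as an assumption that reproduces the quoted values 1.9 and 2.5.

```python
import random


def predicted_exponents(alpha: float, beta: float) -> tuple:
    """Assumed relation between the coin and the in/out-degree exponents."""
    return 1.0 / (1.0 - alpha), 1.0 / (1.0 - beta)


def grow_alpha_beta(n_vertices: int, edges_per_step: int,
                    alpha: float, beta: float, seed: int = 0):
    """Grow a directed graph; each edge points from its origin to its terminal."""
    rng = random.Random(seed)
    in_deg, out_deg = [0], [0]          # start from a single vertex 0
    edges = set()
    for new in range(1, n_vertices):
        in_deg.append(0)
        out_deg.append(0)
        old = list(range(new))
        for _ in range(edges_per_step):
            # Terminal: new vertex with prob. alpha, else old vertex ~ in-degree.
            if rng.random() < alpha:
                terminal = new
            else:
                weights = [in_deg[v] + 1 for v in old]   # +1 smoothing (assumption)
                terminal = rng.choices(old, weights=weights)[0]
            # Origin: new vertex with prob. beta, else old vertex ~ out-degree.
            if rng.random() < beta:
                origin = new
            else:
                weights = [out_deg[v] + 1 for v in old]  # +1 smoothing (assumption)
                origin = rng.choices(old, weights=weights)[0]
            # Reject self-loops and multi-edges.
            if origin == terminal or (origin, terminal) in edges:
                continue
            edges.add((origin, terminal))
            out_deg[origin] += 1
            in_deg[terminal] += 1
    return edges, in_deg, out_deg


if __name__ == "__main__":
    print(predicted_exponents(0.4736, 0.6))
    edges, in_deg, out_deg = grow_alpha_beta(300, 6, 0.4736, 0.6)
    print(len(edges), max(in_deg), max(out_deg))
```

Because rejected attempts are skipped, the number of edges actually added per step can be smaller than `edges_per_step`.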
FIG. 3: The cumulative distributions of the in-degree and
out-degree for e-mail networks, (a) in the questionnaires [15], 
and (b)-(d) in the server log files [24] for (b) all of the internal 
and external nodes as e-mail users, (c) the internal nodes 



1 − β    terminal: new, origin: old    both old vertices

TABLE I: Directed edge generation by the α-β coin.



III. SIMULATIONS FOR STOCHASTIC MODEL 

We study the typical behavior of the SHIR model on the SF networks. In the following
simulations, we set the execution rate $\lambda = 0.1$, the detection rate $\delta = 0.04$, the
average number of edges $k = 6.6$, and five randomly chosen vertices as the initial infection
sources (the following results are similar for other small values $\lambda = 0.2, 0.3$ and
$\delta = 0.05, 0.06$). These small values are realistic, because computer viruses are not
recognized before the prevalence and may be executed by some users. We note that the parameters
are related to the sharpness of the increase and decrease of infections ($\delta$ is more
sensitive). It is well known that, in a closed system of the SHIR model, the number of infected
computers (in the hidden and infectious states) initially increases, saturates, and finally
converges to zero as extinction. The pattern may be different in an open system; indeed,
oscillations have been described by a deterministic Kermack-McKendrick model [22]. However,
whereas a constant population (equal birth and death rates) or territorial competition has
mainly been discussed in that model, the growth of a computer network is obviously more rapid,
and communications by mail are not competitive. Thus, we consider a growing system, in which 50
vertices and the corresponding new $k$ edges are added at every step, from an initial SF
network with $N = 400$ up to 20350 at 400 steps. Here, one step corresponds to a day (400 steps
$\approx$ a year). These values of $\lambda$, $\delta$, $k$, and the growth rate are only
examples with some degree of realism for the simulations, since the actual values, which depend
on the observed period, are still unknown. As shown in Fig. 4(a)(b), the phenomena of
persistent recoverable prevalence are found in the open system, but not in the closed system.

To prevent the wide spread of infections, we investigate how to assign anti-virus software on
the SF networks. We verify the effectiveness of targeted immunization of hubs even in the cases
of recoverable prevalence. Fig. 4(c)(d) show the average number of infected computers with
recoverable prevalence in 100 trials, where the immunized vertices are selected either randomly
or as hubs in out-degree order, for 10 %, 20 %, and 30 % of the growing size at every 30 steps
(corresponding to a month). With hub immunization, the number decreases as the immune rate
becomes larger, and viruses are nearly extinct (only a few viruses remain) at 30 %, as marked
by $\times$ in Fig. 4(c). With randomly selected vertices, the number also decreases as the
immune rate becomes larger, but viruses are not extinct even at 30 %, as marked by $\times$ in
Fig. 4(d). Fig. 5(a)(b) show the number of recovered states by hub and random immunization of
30 % (triangle marks) in comparison with the normal detections (rectangle marks). The immunized
hubs outnumber the normal detections in Fig. 5(a). However, there is no such difference for
random immunization in Fig. 5(b). In the case of 10 %, the relation is reversed; the number of
detections is larger than that of both hub and random immunization. The case of 20 % is
intermediate. From these results, we remark that targeted immunization of hubs strongly
prevents the spread of infections, in spite of a smaller total number of recovered states than
in random immunization.
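
The two selection rules compared above, hubs in out-degree order versus randomly chosen vertices, can be sketched as follows. The function name, the dictionary representation of the out-degrees, and the handling of ties are assumptions made for illustration; how already recovered vertices are treated in the actual simulations is not specified here.

```python
import random


def choose_immunized(out_degree: dict, fraction: float, strategy: str,
                     rng: random.Random = None) -> set:
    """Pick a fraction of the current vertices for immunization.

    out_degree maps each vertex to its out-degree; strategy is "hub"
    (largest out-degree first) or "random".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = rng or random.Random(0)
    vertices = list(out_degree)
    # Small offset guards against floating-point round-off in fraction * size.
    n_pick = int(fraction * len(vertices) + 1e-9)
    if strategy == "hub":
        ranked = sorted(vertices, key=lambda v: out_degree[v], reverse=True)
        return set(ranked[:n_pick])
    if strategy == "random":
        return set(rng.sample(vertices, n_pick))
    raise ValueError("strategy must be 'hub' or 'random'")


if __name__ == "__main__":
    degrees = {v: v for v in range(10)}      # toy example: vertex v has out-degree v
    print(sorted(choose_immunized(degrees, 0.3, "hub")))
    print(sorted(choose_immunized(degrees, 0.3, "random")))
```

In a growing network such a selection would be repeated on the current vertex set at every immunization period, 30 steps in the simulations described above.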



IV. ANALYSIS FOR DETERMINISTIC MODEL 

Although the stochastic SHIR model is realistic, its analysis is very difficult in the open
system. Thus, we analyze simpler deterministic SIR models for the spread of computer viruses to
understand the mechanisms of recoverable prevalence and of extinction by immunization. We
consider the time evolutions of $S(t) > 0$ and $I(t) > 0$ ($t > 0$), which are the numbers of
susceptible and infected vertices. We assume that infection sources exist in an initial
network, and that both the network growth and the spread of viruses progress in continuous time
as an approximation. In addition, we have no specific rules for growing, but consider a
linearly growing network size and the distribution of connections on an undirected connected
graph as a consequence.



A. Homogeneous SIR model 

As the simplest case, in homogeneous networks with only the detection of viruses, the time
evolutions are given by

$$ \frac{dS(t)}{dt} = -b \langle k \rangle S(t) I(t) + a, \tag{1} $$
$$ \frac{dI(t)}{dt} = -\delta_0 I(t) + b \langle k \rangle S(t) I(t), \tag{2} $$

where $a > 0$ and $0 < b, \delta_0 < 1$ denote the growth, infection, and detection rates,
respectively. $\langle k \rangle = \sum_k k P(k)$ is the average number of connections with a
probability $P(k)$. The term $S(t)I(t)$ represents the frequency of contact relations. Note
that the number of recovered vertices $R(t)$ is a shadow variable defined by
$\frac{dR(t)}{dt} = \delta_0 I(t)$. From the network size $N(t) = S(t) + I(t) + R(t)$, the
solution is given by $N(t) = N(0) + at$ as a linear growth. Fig. 6(a) shows the nullclines of

$$ \frac{dS}{dt} = 0: \; S = \frac{a}{b \langle k \rangle I}, \qquad
   \frac{dI}{dt} = 0: \; S = \frac{\delta_0}{b \langle k \rangle} $$
FIG. 4: Typical behavior of the spread on SF networks in (a) a closed system and (b) an open
system with simultaneous progress of both the spread of viruses and the growth of the network.
The lines show the differences in stochastic state transitions. The effects of immunization are
shown as the averages in the open system for (c) hub and (d) random immunization. The open
diamond, square, triangle, and cross marks correspond to the normal detection by the state
transitions, immunization
FIG. 5: Number of vertices in the recovered state by (a) hub and (b) random immunization of
30 %. Each is the average value for recoverable prevalence in 100 trials. The dashed lines
represent the number of vertices that had already changed to the recovered state before the
immunization.



for Eqs. (1)(2). The directions of the vector field are defined by the positive or negative
signs of $\frac{dS}{dt}$ and $\frac{dI}{dt}$. There exists a stable equilibrium point
$(I^*, S^*)$, $I^* = \frac{a}{\delta_0}$. The states of $S$ and $I$ converge to the point with
a damped oscillation. We can easily check that the real parts of the eigenvalues of the
Jacobian are negative at the point.
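
The convergence to the equilibrium can be checked numerically. The following is a minimal sketch that integrates Eqs. (1)(2) with a classical fourth-order Runge-Kutta scheme; the parameter values ($a = 0.1$, $b\langle k \rangle = 0.5$, $\delta_0 = 0.2$), the initial state, and the function names are illustrative assumptions, chosen so that the Jacobian at the equilibrium has complex eigenvalues, and are not taken from the simulations.

```python
def sir_rhs(s, i, a, beta, delta0):
    """Right-hand sides of Eqs. (1) and (2); beta stands for b<k>."""
    ds = -beta * s * i + a
    di = -delta0 * i + beta * s * i
    return ds, di


def integrate_sir(s, i, a, beta, delta0, dt, steps):
    """Classical fourth-order Runge-Kutta integration of the open SIR model."""
    for _ in range(steps):
        k1s, k1i = sir_rhs(s, i, a, beta, delta0)
        k2s, k2i = sir_rhs(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i, a, beta, delta0)
        k3s, k3i = sir_rhs(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i, a, beta, delta0)
        k4s, k4i = sir_rhs(s + dt * k3s, i + dt * k3i, a, beta, delta0)
        s += dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        i += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
    return s, i


def jacobian_trace_det(a, beta, delta0):
    """Trace and determinant of the Jacobian at (S*, I*) = (delta0/beta, a/delta0).

    With beta * S* = delta0 the Jacobian is [[-beta*I*, -delta0], [beta*I*, 0]].
    """
    i_star = a / delta0
    return -beta * i_star, beta * i_star * delta0


if __name__ == "__main__":
    a, beta, delta0 = 0.1, 0.5, 0.2          # illustrative values (assumption)
    s_end, i_end = integrate_sir(1.0, 0.1, a, beta, delta0, dt=0.01, steps=20000)
    trace, det = jacobian_trace_det(a, beta, delta0)
    print(s_end, i_end, a / delta0, delta0 / beta)
    print(trace, det, trace ** 2 - 4.0 * det)
```

A negative trace with a positive determinant gives eigenvalues with negative real parts, and a negative discriminant `trace**2 - 4*det` indicates that the approach to the equilibrium is oscillatory for these values.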



B. Heterogeneous SIR model 

Next, we consider the heterogeneous SF networks at the mean-field level, in which the
connectivity correlations are neglected [16]. We know that static and grown networks have
different properties for the size of the giant component [8] and the connectivity correlations
[5] [12], even if the degree distributions are the same. In particular, the correlations may
influence the spread; however, they are not found in all growing network models or real
systems. We have empirically observed in the previous simulations that the correlations are
very weak in the $(\alpha, \beta)$ model, similar to the nearest-neighbor average connectivity
in the generalized BA model rather than in the fitness model or the AS level of the Internet
[20]. At least, the lack of correlations seems not to be crucial for the absence of an epidemic
threshold [7] [16] [19] [21], and whether correlations exist in e-mail networks is still far
from trivial. Although the mean-field approach, which neglects the correlations in macroscopic
equations at a large network size, is a crude approximation, it is useful for understanding the
mechanisms of the spread in growing networks, as far as it is qualitatively similar to the
behavior of viruses in the stochastic model or in observed real data. Indeed, the following
results are consistent with the analysis for correlated cases [10], except for quantitative
differences.

FIG. 6: Nullclines and the vector fields for (a) homogeneous and (b) heterogeneous SIR models.
The state in both cases converges to an equilibrium point with a damped oscillation, which
corresponds to persistent recoverable prevalence around the non-zero level $I^*$ or $I_k^*$.

We introduce a linear kernel [12] as $N_k(t) \sim a_k \times t$, where
$N_k(t) = S_k(t) + I_k(t) + R_k(t)$ is the sum of the numbers of susceptible, infected, and
recovered vertices with connectivity $k$, and the growth rate is
$a_k \stackrel{\mathrm{def}}{=} A k^{-\nu}$, $A > 0$, $\nu > 2$. Note that the total
$N(t) = \sum_k N_k(t) \sim (\sum_k a_k) \times t$ means a linear growth of the network size.
Since the maximum degree increases as time progresses and approaches infinity, the network has
a nearly constant growth rate
$\sum_{k=m}^{\infty} a_k \sim \int_m^{\infty} A k^{-\nu} dk = \frac{A m^{1-\nu}}{\nu - 1}$
for large $t$. As shown in [12], the introduction of the linear kernel does not contradict the
preferential (linear) attachment [3] [4].
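
As a small worked example of this normalization, the amplitude $A$ can be obtained from a prescribed total growth rate through the integral approximation $\int_{k_{min}}^{\infty} A k^{-\nu} dk = A k_{min}^{1-\nu} / (\nu - 1)$. The function names below are hypothetical; the values 50 vertices per step and $\nu = 2.2$ are the ones used for the comparison with the simulations.

```python
def total_growth_rate(amplitude: float, nu: float, k_min: float = 1.0) -> float:
    """Integral approximation of sum_k a_k with a_k = amplitude * k**(-nu)."""
    if nu <= 1.0:
        raise ValueError("the integral converges only for nu > 1")
    return amplitude * k_min ** (1.0 - nu) / (nu - 1.0)


def kernel_amplitude(growth_rate: float, nu: float, k_min: float = 1.0) -> float:
    """Amplitude A that yields the given total growth rate."""
    if nu <= 1.0:
        raise ValueError("the integral converges only for nu > 1")
    return growth_rate * (nu - 1.0) / k_min ** (1.0 - nu)


if __name__ == "__main__":
    amplitude = kernel_amplitude(50.0, 2.2)      # 50 new vertices per step
    print(amplitude, total_growth_rate(amplitude, 2.2))
```

The integral replaces a discrete sum over degrees, so the value of $A$ obtained this way is only an approximate normalization.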
At the mean-field level in a somewhat large network with only the detection of viruses, the
time evolutions of $S_k > 0$ and $I_k > 0$ are given by

$$ \frac{dS_k(t)}{dt} = -b k S_k(t) \Theta(t) + a_k, \tag{3} $$
$$ \frac{dI_k(t)}{dt} = -\delta_0 I_k(t) + b k S_k(t) \Theta(t), \tag{4} $$

where the shadow variable $R_k(t)$ is implicitly defined by
$\frac{dR_k(t)}{dt} = \delta_0 I_k(t)$. The factor
$\Theta(t) \stackrel{\mathrm{def}}{=} \sum_k c_k I_k(t)$,
$c_k \stackrel{\mathrm{def}}{=} \frac{k P(k)}{\langle k \rangle}$, represents the expectation
that any given edge points to an infected vertex.

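We consider a section of $I_{k'} = I_{k'}^*$: const. for all $k' \neq k$. Fig. 6(b) shows the
nullclines of

$$ \frac{dS_k}{dt} = 0: \; S_k = \frac{a_k}{k b \Theta}
   = \frac{a_k}{k b c_k I_k + k b \sum_{k'} c_{k'} I_{k'}^*}, $$
$$ \frac{dI_k}{dt} = 0: \; S_k = \frac{\delta_0 I_k}{k b \Theta}
   = \frac{\delta_0 I_k}{k b c_k I_k + k b \sum_{k'} c_{k'} I_{k'}^*}, $$

and the vector field for Eqs. (3)(4). There exists a stable equilibrium point
$(I_k^*, S_k^*) \stackrel{\mathrm{def}}{=} \left( \frac{a_k}{\delta_0}, \frac{a_k}{k b \Theta^*} \right)$,
because of

$$ \Theta^* = \sum_k c_k I_k^* \approx \frac{A \gamma m^{\gamma}}{\delta_0}
   \int_m^{\infty} k^{-(\nu + \gamma + 1)} dk = \frac{A \gamma m^{-\nu}}{\delta_0 (\nu + \gamma)}, $$

by using $c_k = \gamma \times m^{\gamma} \times k^{-(\gamma + 1)}$ for the generalized BA model
[16] with a power-law degree distribution $P(k) = (1 + \gamma) m^{1+\gamma} k^{-2-\gamma}$,
$\langle k \rangle = \frac{1 + \gamma}{\gamma} m$ (which includes the simple BA model [4] at
$\gamma = 1$). On these state spaces in Fig. 6(a)(b), only the case of $a = 0$ or $a_k = 0$
gives the extinction: $I^* = 0$ or $I_k^* = 0$. It means that we must stop the growth to
prevent the infections by the detection alone. In addition, the homogeneous and heterogeneous
systems are regarded as oscillators in Fig. 7(a)(b).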



C. Effect of immunization 

We study the effect of random and hub immunization. With the random immune rate
$0 < \delta_r < 1$, the time evolutions are given by

$$ \frac{dS_k(t)}{dt} = -b k S_k(t) \Theta(t) + a_k - \delta_r S_k(t), \tag{5} $$
$$ \frac{dI_k(t)}{dt} = -\delta_0 I_k(t) + b k S_k(t) \Theta(t) - \delta_r I_k(t), \tag{6} $$

where the shadow variable $R_k(t)$ is also defined by
$\frac{dR_k(t)}{dt} = \delta_0 I_k(t) + \delta_r (S_k(t) + I_k(t))$.

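We also consider a section of $I_{k'} = I_{k'}^*$: const. for all $k' \neq k$. From the
nullclines of Eqs. (5) and (6) with random immunization, there exists a stable equilibrium
point
$(I_k^*, S_k^*) \stackrel{\mathrm{def}}{=} \left( \frac{k b \Theta^* a_k}{(\delta_0 + \delta_r)(\delta_r + k b \Theta^*)}, \frac{a_k}{\delta_r + k b \Theta^*} \right)$,
if $\Theta^* = \sum_k c_k I_k^*$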
FIG. 7: Oscillators for (a) homogeneous and (b) heterogeneous SIR models in the open system.
They consist of S-I pairs with excitatory ($\rightarrow$) and inhibitory ($\dashv$)
connections, and an input bias $a$ or $a_k$ of the growth rate. The factor $\Theta$ acts as a
global inhibition or excitation.



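is self-consistent at the point. The condition is given by

$$ \left. \frac{d}{d\Theta} \left( \sum_k c_k I_k^* \right) \right|_{\Theta = 0}
   \approx \frac{b}{\delta_r (\delta_0 + \delta_r)} \int_m^{\infty} k c_k a_k \, dk
   = \frac{b A \gamma m^{1-\nu}}{\delta_r (\delta_0 + \delta_r)(\gamma + \nu - 1)} > 1. $$

In this case, the state space is the same as shown in Fig. 6(b).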

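Next, we assume $I_{k'}^* = 0$ for all $k' \neq k$ to discuss the extinction. On the section,
the nullclines are

$$ \frac{dS_k}{dt} = 0: \; S_k = \frac{a_k}{\delta_r + k b \Theta}
   = \frac{a_k}{\delta_r + k b c_k I_k}, $$
$$ \frac{dI_k}{dt} = 0: \; S_k = \frac{(\delta_0 + \delta_r) I_k}{k b \Theta}
   = \frac{\delta_0 + \delta_r}{k b c_k} \quad (I_k \neq 0) $$

for Eqs. (5)(6). The necessary condition of extinction is that the point
$(0, \frac{a_k}{\delta_r})$ on the nullcline $\frac{dS_k}{dt} = 0$ is below the line
$S_k = \frac{\delta_0 + \delta_r}{k b c_k}$: const. of $\frac{dI_k}{dt} = 0$. From the
condition

$$ \frac{a_k}{\delta_r} < \frac{\delta_0 + \delta_r}{k b c_k}, $$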
FIG. 8: Saddle-node bifurcation between (a) damped oscillation of recoverable prevalence and
(b) convergence to the extinction by the immunization in the heterogeneous SIR model. The state
space is changed by the bifurcation parameters $\delta_r$ and $a_k$ for the value of
$S_k = \frac{a_k}{\delta_r}$.
FIG. 9: Non-extinction in (a) homogeneous and (b) heterogeneous SIS models. The number of
vertices in the infected state, $I$ or $I_k$, finally diverges to infinity.



we obtain

$$ \delta_r > -\delta_0 + \sqrt{\delta_0^2 + 4 a_k k b c_k}. \tag{7} $$

In addition, $0 < \delta_r < 1$ must be satisfied; this is given by
$a_k < \frac{1 + 2 \delta_0}{4 b \gamma}$ from $k c_k = \gamma m^{\gamma} k^{-\gamma}$,
$m \leq k < \infty$, $\gamma > 0$, for the generalized BA model [16]. In this case, there
exists a stable equilibrium point, otherwise a saddle and a stable equilibrium point, as shown
in Fig. 8(a)(b). The state space is changed through a saddle-node bifurcation by the values of
the growth rate $a_k$ and the immune rate $\delta_r$.

For the hub immunization [7], $\delta_r$ is replaced by $0 < \delta_h k^{\tau} < 1$,
$\tau > 0$, e.g. $\tau = 1$ as immunization proportional to the degree. We may choose an immune
rate $\delta_h$ that is $k^{\tau}$ times smaller than $\delta_r$ for (7). In other words, the
necessary condition of extinction in (7) is relaxed to
$a_k < \frac{m^{\tau} (m^{\tau} + 2 \delta_0)}{4 b \gamma}$. Thus viruses can be removed at a
larger growth rate.

The above conditions almost fit the results for the stochastic model in Section III. We can
evaluate them using the corresponding parameters: $m = 1$,
$\nu = 2 + \gamma = \frac{\gamma_{in} + \gamma_{out}}{2} = 2.2$, $b \leftrightarrow \lambda = 0.1$,
$\delta_0 \leftrightarrow \delta = 0.04$, $\delta_r$ or $\delta_h = 0.1, 0.2, 0.3$, $\tau = 1$,
and $A = 60$ from
$(\sum_k a_k) \sim \int_1^{\infty} A k^{-\nu} dk = \frac{A}{\nu - 1} = 50$. By simple
calculations, we find that $a_k < \frac{1 + 2 \delta_0}{4 b \gamma}$ is satisfied for
$k \geq 2$. The condition (7) is satisfied only for $k \geq 5$ with random immunization of
30 % and $k \geq 7$ with 20 %, so the extinction of viruses is difficult because of the spread
of infection from many vertices with low degree $k \leq 4$, whereas it is satisfied for
$k \geq 3$ with hub immunization of both 20 % and 30 % by the factor of $1/k^{\tau}$. The
delicate mismatch at $k = 1, 2$ may come from the difference between the complicated stochastic
behavior as in Fig. 1 and the crude macroscopic approximation.
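
The evaluation described above can be reproduced with a short script. This is a sketch under stated assumptions: condition (7) is taken in the form $\delta_r > -\delta_0 + \sqrt{\delta_0^2 + 4 a_k k b c_k}$, the bound on the growth rate as $a_k < (1 + 2\delta_0)/(4 b \gamma)$, the hub case replaces $\delta_r$ by $\delta_h k^{\tau}$, and all function names are hypothetical.

```python
import math

# Parameters quoted in the text for the comparison with Section III.
M = 1                 # minimum degree m
GAMMA = 0.2           # gamma, so that nu = 2 + gamma = 2.2
NU = 2.0 + GAMMA
B = 0.1               # infection rate b
DELTA0 = 0.04         # detection rate delta_0
A = 60.0              # amplitude of the growth rate a_k = A * k**(-nu)
TAU = 1.0             # hub immunization exponent tau


def growth_rate(k: int) -> float:
    """a_k = A * k**(-nu)."""
    return A * k ** (-NU)


def k_times_ck(k: int) -> float:
    """k * c_k = gamma * m**gamma * k**(-gamma) for the generalized BA model."""
    return GAMMA * M ** GAMMA * k ** (-GAMMA)


def satisfies_condition7(k: int, immune_rate: float) -> bool:
    """Assumed form of condition (7) for a given effective immune rate."""
    x = growth_rate(k) * B * k_times_ck(k)
    return immune_rate > -DELTA0 + math.sqrt(DELTA0 ** 2 + 4.0 * x)


def smallest_degree(predicate, k_max: int = 1000):
    """Smallest degree k >= m for which the predicate holds (None if never)."""
    for k in range(M, k_max + 1):
        if predicate(k):
            return k
    return None


def first_degree_random(delta_r: float):
    return smallest_degree(lambda k: satisfies_condition7(k, delta_r))


def first_degree_hub(delta_h: float):
    return smallest_degree(lambda k: satisfies_condition7(k, delta_h * k ** TAU))


def first_degree_growth_bound():
    bound = (1.0 + 2.0 * DELTA0) / (4.0 * B * GAMMA)
    return smallest_degree(lambda k: growth_rate(k) < bound)


if __name__ == "__main__":
    print("growth bound holds from k =", first_degree_growth_bound())
    for rate in (0.2, 0.3):
        print(rate, "random:", first_degree_random(rate), "hub:", first_degree_hub(rate))
```

With these assumed forms, the script returns the degrees quoted in the text: 2 for the growth bound, 7 and 5 for random immunization of 20 % and 30 %, and 3 for hub immunization in both cases.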



D. SIS model 

Finally, to show that the recovered state is necessary, we consider the SIS models in the open
system. The time evolutions on homogeneous networks are given by

$$ \frac{dS(t)}{dt} = \delta_0 I(t) - b \langle k \rangle S(t) I(t) + a, \tag{8} $$
$$ \frac{dI(t)}{dt} = -\delta_0 I(t) + b \langle k \rangle S(t) I(t), \tag{9} $$

where $N(t) = S(t) + I(t)$. The nullclines are

$$ \frac{dS}{dt} = 0: \; S = \frac{\delta_0 I + a}{b \langle k \rangle I}
   = \frac{\delta_0}{b \langle k \rangle} + \frac{a}{b \langle k \rangle I}, \qquad
   \frac{dI}{dt} = 0: \; S = \frac{\delta_0}{b \langle k \rangle} $$

for Eqs. (8)(9). There exists a gap of $\frac{a}{b \langle k \rangle I} > 0$ even in
$I \to \infty$. Furthermore, the time evolutions on heterogeneous networks are given by

$$ \frac{dS_k(t)}{dt} = \delta_0 I_k(t) - b k S_k(t) \Theta(t) + a_k, \tag{10} $$
$$ \frac{dI_k(t)}{dt} = -\delta_0 I_k(t) + b k S_k(t) \Theta(t). \tag{11} $$

On a section $I_{k'}$: const., the nullclines are

$$ \frac{dS_k}{dt} = 0: \; S_k = \frac{\delta_0 I_k + a_k}{k b \Theta}, \qquad
   \frac{dI_k}{dt} = 0: \; S_k = \frac{\delta_0 I_k}{k b \Theta} $$

for Eqs. (10)(11). There also exists a gap between the nullclines. Fig. 9(a)(b) show the
nullclines and the vector field. Thus, the dynamics in the SIS model is quite different from
that in the SIR model. We cannot realize both the extinction and the recoverable prevalence of
viruses in the SIS model, in any case, even in the open system.
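
The absence of an equilibrium in the open SIS model can be illustrated by integrating Eqs. (8)(9) numerically. The following is a minimal sketch with a classical fourth-order Runge-Kutta scheme; the parameter values ($a = 0.1$, $b\langle k \rangle = 0.5$, $\delta_0 = 0.2$), the initial state, and the function names are illustrative assumptions.

```python
def sis_rhs(s, i, a, beta, delta0):
    """Right-hand sides of Eqs. (8) and (9); beta stands for b<k>."""
    ds = delta0 * i - beta * s * i + a
    di = -delta0 * i + beta * s * i
    return ds, di


def integrate_sis(s, i, a, beta, delta0, dt, steps):
    """Classical fourth-order Runge-Kutta integration of the open SIS model."""
    for _ in range(steps):
        k1s, k1i = sis_rhs(s, i, a, beta, delta0)
        k2s, k2i = sis_rhs(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i, a, beta, delta0)
        k3s, k3i = sis_rhs(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i, a, beta, delta0)
        k4s, k4i = sis_rhs(s + dt * k3s, i + dt * k3i, a, beta, delta0)
        s += dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        i += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
    return s, i


if __name__ == "__main__":
    a, beta, delta0 = 0.1, 0.5, 0.2          # illustrative values (assumption)
    for steps in (20000, 40000):             # t = 200 and t = 400 with dt = 0.01
        s_end, i_end = integrate_sis(1.0, 0.1, a, beta, delta0, 0.01, steps)
        print(steps * 0.01, s_end, i_end, s_end + i_end)
```

Since $S + I = N(0) + at$ and $S$ stays close to the nullcline value $\delta_0 / (b \langle k \rangle)$, the number of infected vertices keeps growing with the network instead of settling at a finite level.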



V. CONCLUSION



In summary, we have investigated the spread of viruses via e-mail on linearly growing SF
network models whose power-law degree distribution exponents are estimated from real data of
sent and received mails [24] or from the generalized BA model [3] [16]. The dynamic behavior is
the same in both the simulations for a realistic stochastic SHIR model and the mean-field
approximation, without connectivity correlations, for the macroscopic equations of simpler
deterministic SIR models. The obtained results suggest that the recoverable prevalence stems
from the growth of the network; it bifurcates from the extinction state according to the
relations among the growth, infection, and immune rates. Moreover, targeted immunization of
hubs is effective even in the growing system. Quantitative fitting to actually observed virus
data and more detailed analysis including the correlations are left for further studies.